Optimal bandwidth selection for re-substitution entropy estimation

Authors

  • Yu-Lin He
  • James Nga-Kwok Liu
  • Xizhao Wang
  • Yan-Xing Hu
Abstract

This study presents a new fusion approach for selecting an optimal bandwidth for the re-substitution entropy estimator (RE). When continuous entropy is approximated via density estimation, two types of error arise: entropy estimation error (type-I error) and density estimation error (type-II error). Both errors depend strongly on the undetermined bandwidth. First, experiments on 24 typical probability distributions show that the optimal bandwidths associated with these two errors are inconsistent with each other. Second, separate error measures for type-I and type-II errors are derived. A trade-off between type-I and type-II errors is a fundamental property of the proposed method, called RE_{I+II}: the two errors are fused and an optimal bandwidth for RE_{I+II} is solved for. Finally, experimental comparisons are carried out to verify the estimation performance of the proposed strategy. Discretization has traditionally been regarded as a necessary preprocessing step for calculating continuous entropy, so the nine most commonly used unsupervised discretization methods are included to compare their computational performance with that of RE_{I+II}. Five of the most popular entropy estimators are also included in the comparison: the splitting data estimator (SDE), cross-validation estimator (CVE), m-spacing estimator (mSE), mn-spacing estimator (mnSE), and nearest neighbor distance estimator (NNDE). Simulation studies on 24 different typical density distributions show that RE_{I+II} obtains better estimation performance than the other methods involved. The comparative results also reveal the estimation behavior of the different entropy estimation methods.
The empirical analysis demonstrates that RE_{I+II} is less sensitive to the data and generalizes better when estimating continuous entropy. RE_{I+II} makes it possible to derive a convenient optimal bandwidth from a given dataset. © 2012 Elsevier Inc. All rights reserved.
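
As a rough illustration of the quantity the abstract discusses, the following minimal sketch computes a re-substitution entropy estimate from a Gaussian kernel density estimate. It is not the paper's fused bandwidth rule: the bandwidth here comes from the common normal-reference rule of thumb, used only as a stand-in, and the function names (`kde_at_samples`, `resubstitution_entropy`, `rule_of_thumb_bandwidth`) are hypothetical.

```python
import numpy as np


def kde_at_samples(x, h):
    """Gaussian kernel density estimate evaluated at the sample points themselves."""
    x = np.asarray(x, dtype=float)
    # Pairwise scaled differences (x_i - x_j) / h, shape (n, n).
    diffs = (x[:, None] - x[None, :]) / h
    kernel = np.exp(-0.5 * diffs ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (x.size * h)


def resubstitution_entropy(x, h):
    """Re-substitution estimate: minus the mean log-density at the samples."""
    return -np.mean(np.log(kde_at_samples(x, h)))


def rule_of_thumb_bandwidth(x):
    """Normal-reference bandwidth (assumption: a stand-in, not the paper's rule)."""
    x = np.asarray(x, dtype=float)
    return 1.06 * np.std(x, ddof=1) * x.size ** (-1.0 / 5.0)


rng = np.random.default_rng(0)
sample = rng.normal(size=2000)          # standard normal test data

h = rule_of_thumb_bandwidth(sample)
estimate = resubstitution_entropy(sample, h)
true_entropy = 0.5 * np.log(2.0 * np.pi * np.e)  # entropy of N(0, 1)

print(f"bandwidth h = {h:.4f}")
print(f"estimated entropy = {estimate:.4f}, true entropy = {true_entropy:.4f}")

# A far too small bandwidth inflates the density at each sample and drags the estimate down.
print(f"estimate with h = 1e-4: {resubstitution_entropy(sample, 1e-4):.4f}")
```

The last line shows why the bandwidth matters: the same estimator gives a very different entropy value when the bandwidth is chosen poorly, which is the dependence that the fused error criterion in the abstract is meant to control.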

Similar resources

A Framework for Optimal Attribute Evaluation and Selection in Hesitant Fuzzy Environment Based on Enhanced Ordered Weighted Entropy Approach for Medical Dataset

Background: In this paper, a generic hesitant fuzzy set (HFS) model for clustering various ECG beats according to weights of attributes is proposed. A comprehensive review of electrocardiogram signal classification and segmentation methodologies indicates that algorithms able to effectively handle the nonstationarity and uncertainty of the signals should be used for ECG analysis. Ex...

Modeling of the Maximum Entropy Problem as an Optimal Control Problem and its Application to Pdf Estimation of Electricity Price

In this paper, the continuous optimal control theory is used to model and solve the maximum entropy problem for a continuous random variable. The maximum entropy principle provides a method to obtain least-biased probability density function (Pdf) estimation. In this paper, to find a closed form solution for the maximum entropy problem with any number of moment constraints, the entropy is consi...

A non-asymptotic bandwidth selection method for kernel density estimation of discrete data

In this paper we explore a method for modeling of categorical data derived from the principles of the Generalized Cross Entropy method. The method builds on standard kernel density estimation techniques by providing a novel non-asymptotic data-driven bandwidth selection rule. In addition to this, the Entropic approach provides model sparsity not present in the standard kernel approach. Numerica...

Bandwidth Selection in Kernel Density Estimation: a Review

Although nonparametric kernel density estimation is nowadays a standard technique in explorative data analysis, there is still a big dispute on how to assess the quality of the estimate and which choice of bandwidth is optimal. The main argument is on whether one should use the Integrated Squared Error or the Mean Integrated Squared Error to define the optimal bandwidth. In the last years a lot...

Asymptotics and Optimal Bandwidth Selection for Highest Density Region Estimation, by R. J. Samworth

We study kernel estimation of highest-density regions (HDR). Our main contributions are two-fold. First, we derive a uniform-in-bandwidth asymptotic approximation to a risk that is appropriate for HDR estimation. This approximation is then used to derive a bandwidth selection rule for HDR estimation possessing attractive asymptotic properties. We also present the results of numerical studies th...

Journal title:
  • Applied Mathematics and Computation

Volume 219  Issue 

Pages  -

Publication date 2012